Bimodal speech recognition using coupled hidden Markov models
Abstract
In this paper we present a bimodal speech recognition system in which the audio and visual modalities are modeled and integrated using coupled hidden Markov models (CHMMs). CHMMs are probabilistic inference graphs that have hidden Markov models as sub-graphs. Chains in the corresponding inference graph are coupled through matrices of conditional probabilities that model temporal influences between their hidden state variables. The coupling probabilities are both cross-chain and cross-time. The latter is essential for allowing temporal influences between chains, which is important in modeling bimodal speech. Our bimodal speech recognition system employs a two-chain CHMM, with one chain associated with the acoustic observations and the other with the visual features. A deterministic approximation for maximum a posteriori (MAP) estimation is used to enable fast classification and parameter estimation. We evaluated the system on a speaker-independent connected-digit task. Compared with an acoustic-only ASR system trained on only the audio channel of the same database, the bimodal system consistently demonstrates improved noise robustness at all SNRs. We further compare the CHMM system reported in this paper with our earlier bimodal speech recognition system, in which the two modalities are fused by concatenating the audio and visual features. The recognition results clearly show the advantages of the CHMM framework in the context of bimodal speech recognition.
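To make the coupling structure concrete, the following is a minimal sketch of decoding in a two-chain CHMM, in which each chain's next state is conditioned on the previous states of both chains. It is not the deterministic MAP approximation mentioned in the abstract, whose details are not given here. Instead it runs exact Viterbi decoding over the joint (audio, visual) state space, which is only practical for small state counts. The function name `chmm_viterbi`, the table layout, and the toy numbers are all assumptions made for illustration.

```python
import math


def chmm_viterbi(log_pi_a, log_pi_v, log_A, log_V, log_ba, log_bv):
    """Most likely joint (audio, visual) state path of a two-chain CHMM.

    Assumed (hypothetical) table layout, all values in the log domain:
      log_A[i][j][k] = log P(a_t = k | a_{t-1} = i, v_{t-1} = j)
      log_V[i][j][l] = log P(v_t = l | a_{t-1} = i, v_{t-1} = j)
      log_ba[t][k]   = log p(audio frame t  | a_t = k)
      log_bv[t][l]   = log p(visual frame t | v_t = l)
    """
    T, Na, Nv = len(log_ba), len(log_pi_a), len(log_pi_v)
    states = [(a, v) for a in range(Na) for v in range(Nv)]

    # Initialisation: independent priors plus the first frame of each stream.
    delta = {(a, v): log_pi_a[a] + log_pi_v[v] + log_ba[0][a] + log_bv[0][v]
             for a, v in states}
    back = []

    for t in range(1, T):
        new_delta, ptr = {}, {}
        for k, l in states:
            # Both chains condition on the full previous joint state (i, j):
            # this is the cross-chain, cross-time coupling.
            def score(s, k=k, l=l):
                i, j = s
                return delta[s] + log_A[i][j][k] + log_V[i][j][l]

            best_prev = max(states, key=score)
            new_delta[(k, l)] = score(best_prev) + log_ba[t][k] + log_bv[t][l]
            ptr[(k, l)] = best_prev
        delta = new_delta
        back.append(ptr)

    # Backtrack from the best final joint state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]


# Toy usage: two states per chain, three frames, uniform priors/transitions.
u = math.log(0.5)
uniform = [[[u, u], [u, u]], [[u, u], [u, u]]]
audio_lik = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
visual_lik = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]]
log_ba = [[math.log(p) for p in row] for row in audio_lik]
log_bv = [[math.log(p) for p in row] for row in visual_lik]
path, score = chmm_viterbi([u, u], [u, u], uniform, uniform, log_ba, log_bv)
```

With uniform transition tables the two chains are effectively independent, so the toy call only exercises the mechanics. Replacing `uniform` with tables in which `log_A[i][j][k]` actually varies with `j` (and `log_V[i][j][l]` with `i`) is what lets one stream influence the other over time. The joint search costs on the order of `T * (Na * Nv)^2` operations, which is one reason a faster approximation is attractive for real recognizers.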
Publication date: 2000